3B-L3 Stereo Correspondence

Intro

Until now we’ve defined epipolar geometry which relates views from two cameras geometrically and if a point in one view is known it is solely a one-dimensional search for that point in the other image if the relative position of the two scenes is known.

To make life easy make some simplifying assumptions:

  1. Coplanar image planes (parallel focal axes).
  2. Identical focal lengths.
  3. Horizontal and parallel epipolar lines such that y doesn’t change from image to image.

Correspondence

The correspondence problem: Multiple match hypotheses satisy the epipolar constraint but which is correct?

The verged case

The verged case

We need some additional constraints. The epipolar constraint must be true as a result of camera geometry but there are other soft constraints that are likely going to be true:

  • Similarity: A pixel in the left image will look like it’s sister pixel in the right image.
  • Uniqueness: The relationship between the two images is such that a pixel on the left will match only one pixel on the right.
  • Ordering: Pixels lined up A, B, C in the left image will be lined up A, B, C in the right image.
  • The Disparity Gradient is Limited: The depth doesn’t change quickly.

To find matches in the pairs assume:

  • Most points visible in one view are visible in both.
  • Image regions for the matches are similar in appearance.

Dense Correspondence Search: For every pixel / window in the left image compare it with every pixel / window on the same epipolar line in the right image and pick the position with the minimum match cost (pick your flavor of sum of squared difference, normalized correlation, etc…)

This is a notional image of the disparity measured via sum of squared difference across the scan line.

Suppose one image is slightly brighter because the gains are off; direct subtraction, because of the scaling intensity, will produce error and might not produce the best match. You can do normalized correlation with SSD and you’ve enforced a similarity constraint.

Quiz: Find the Best Match

import cv2
import numpy as np
from cs6476helpers import *
left = cv2.imread("images/18_04_01.PNG")
right = cv2.imread("images/18_04_02.PNG")
imshow(left)
imshow(right)
left_gray = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
right_gray = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)
patch_loc = (119, 169)
patch_size = (100, 100)
patch_left = left_gray[patch_loc[0]:(patch_loc[0] + patch_size[0]), patch_loc[1]:(patch_loc[1] + patch_size[1])]
imshow(patch_left)
strip_right = right_gray[patch_loc[0]:(patch_loc[0] + patch_size[0]),:]
imshow(strip_right)
def find_best_match(patch,strip):
  rows = strip.shape[1] - patch.shape[1]
  errors = []
  for i in range(rows):
    ssd = np.sum((patch - strip[:,i:(i+patch.shape[1])])**2)
    errors.append(ssd)
  errors = np.asarray(errors)
  best_x = np.argmin(errors)
  return(best_x)
best_x = find_best_x(patch_left,strip_right)
patch_right = strip_right[:,best_x:(best_x+patch_size[1])]
imshow(patch_right)

Correspondence Problem

Two images with a horizontal epipolar line where things aren’t quite so clear cut. There will be ambiguities and false positives!

Following the same idea of sliding along the epipolar line searching for a match yields pretty good results when we do cross correlation!

What happens when the point falls in an area where the image is kind of textureless (white wall!).

Well, crap. That’s a lot of peaks. How do we justify picking one? How do we fix this? Our problem can be resolved by thinking about picking a larger window.

Quiz: Good Regions to Match

Which regions are good for stereo matching?

Basically anywhere with unique textures! Discard anything that’s too dark or too light.

Effect of Window Size

How should we choose an effective window size? There’s no easy answer.

For a small window size in this image we can match branches pretty well but everything else is quite poor! Increasing the window size matches the ground pattern quite well but the tree branches lose performance.

Occlusion

Beyond the hard constraint let’s revisit the soft constraints of uniqueness and order.

Uniqueness says that there’s no more than one match, but why no more than. Why not one? Occlusion!

In this image you can see that as the sight line slides from right to left the green bar occluded portions of the right bar and different portions are visible in each image.

(These are half occluded because they are visible in one image. Fully occluded is not visible in either image.)

This is why there is rarely an exact one for one match.

Ordering Constraint

Typically a single solid surface will follow the ordering constraint. What’s not a single solid?

Transparent surfaces with markings on them!

More commonly a narrow occluding surface will cause this.

Quiz: Match Two Strips

import cv2
import numpy as np
from cs6476helpers import *

left = cv2.imread("images/18_04_01.PNG")
right = cv2.imread("images/18_04_02.PNG")

left_gray = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
right_gray = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)

y = 119
b = 100

strip_left = left_gray[y:y+b,:]
imshow(strip_left)

strip_right = right_gray[y:y+b,:]
imshow(strip_right)

def disparity(strip_left, strip_right, b):
  num_blocks = strip_left.shape[1]//b
  disparity = []
  for i in range(num_blocks):
    x_left = i * b
    patch_left = strip_left[:,x_left:(x_left + b)]
    x_right = find_best_match(patch_left,strip_right)
    disparity.append(x_left-x_right)
  return disparity

disparity(strip_left,strip_right,b)

Stereo Results

A couple of slightly more sophisticated methods exist for solving the ordering problem. The previous methods are called ‘window-based matching’

Optimize correspondence assignments jointly. This can be done a scanline at a time or on a full 2D grid.

We want the best set.

Dynamic Programming Formula

The image plots a scanline from the left against a scanline on the right. A solution mapping the left image to the right image is represented by a single path through the image from the far left point on the left scanline to the far right point on the right scanline. The best path is the best match and is the best because it is the lowest cost.

There are three different path type possibilities. If your path travels directly down the diagonal then it’s a one for one match! The disparity doesn’t change and if the disparity at the farthest left part is +3 then it is +3 throughout.

If, like in the image, a range of the y axis is mapped to a single x value that is a left occlusion while if a range of x is mapped to a singular y that is a right occlusion.

The goal is to find the least cost through. Going diagonally the cost is a match between two pixels; when navgating around occlusion you must pay some extra cost. There are various forms of solving this but we will solve it with dynamic programming. You have to be able to compute the cost of traveling horizontally, vertically, or diagonally.

Coherent Stereo on 2D Grid

Looking at the sample image agan we see that we have a much better representation of depth and occlusion. There are some artifacts and there is streaking because each scanline is done independently. You cannot use dynamic programming to find a spatially coherent set of disparities over a 2D grid. Other ways exst.

What defines a good stereo correspondence?

  • Match quality: We want each pixel to find a good appearance match in the other image.
  • Smoothness: If two pixels are adjacent they should usually move around the same amount. They should have around the same disparity.

We might want to penalize something that has lots of jumps.

This example solves via energy minimization. The image on the right is the disparity and the energy terms are:

  • Data term: \(E_{data} = \sum_i (W_1(i) - W_2(i+D(i)))^2\), which is saying that summed over all the pixels the difference between the two windows should be as low as possible.
  • Smoothness term: \(E_{smooth} = \sum_{neighbors-{i,j}} \rho(D(i)-D(j))\), where it’s ok to make small changes but the price grows quickly. The whole term maxes out to prevent explosions.
  • Total Energy: \(E= \alpha E_{data}(I_1,I_2,D) + \beta E_{smooth} (D)\)

Find a set of coeffcients to minimize the disparity funciton.

Better Results and Challenges

Energy functions of this form can be minimized using graph cuts.

It works pretty well, but it’s not without issues.

There are some challenges wth stereo, like textureless areas, occlusions, and violations of brightness constancy, and really large baselines. Camera calibration issues are bad for epipolar lines, too!

End